This report summarizes a meeting held in Boulder, CO, USA (19-20 October 2012) on fungal community analyses using ultra-high-throughput sequencing of the internal transcribed spacer (ITS) region of the nuclear ribosomal RNA (rRNA) genes. The meeting was organized as a two-day workshop, with the primary goal of supporting collaboration among researchers to improve fungal ITS sequence resources and to develop recommendations for standard ITS primers for the research community.
ITS sequence database for fungal community analyses

Sequencing-based techniques have allowed characterization of microbial communities from environmental samples without relying on cultivation. Because of improvements in these techniques and in the methods required to analyze the resulting datasets, microbial ecology has been advancing rapidly: microbial diversity can now be surveyed to an extent previously unimaginable. The internal transcribed spacer (ITS) and other ribosomal RNA gene sequence regions ("rRNA" in the rest of this document) have been used successfully to profile fungal communities using Sanger [1] and 454 [2] sequencing. The arrival of ultra-high-throughput sequencing platforms [3] promises new insights into the diversity and ecology of fungal communities, yet few studies of fungal communities have employed this technology successfully. Progress has been hampered, in part, by the lack of a high-quality reference database of fungal sequences for the ITS region of the rRNA operon [4], which is now the most widely sequenced DNA region in fungi [5] and is the marker of choice for molecular identification of most fungal taxa [6].

For Bacteria and Archaea, where the ribosomal small subunit (SSU/16S) gene is the primary marker in environmental sequencing, efforts have been made to improve the quality of the public reference sequence datasets, including GreenGenes [7] and RDP [8]. The same is true for more general SSU/large subunit (LSU) rRNA gene sequence databases, such as SILVA, which includes all three domains of life [9]. For fungi, a "hand-curated" LSU sequence reference set is currently available, and work is underway to apply similar methods to improve the ITS database [10]. Because sequencing technologies continue to improve, the number of ITS sequences in primary sequence repositories such as the INSDC will steadily increase, and quality control via hand-curation of specialized, publicly available rRNA gene sequence databases will not be sustainable.

Primary sequence repositories are already experiencing explosive growth in the number of unidentified environmental fungal ITS sequences [11], yet these sequences will be of limited use in improving our overall understanding of fungal diversity unless they are properly identified and can be placed within a phylogenetic context. When sufficiently closely related sequences exist, environmental sequences can be placed within a phylogenetic context today simply by aligning them with related sequences and constructing trees. However, because ITS evolves rapidly, constructing phylogenies that span large taxonomic ranges remains extremely challenging. An even more important problem is that of misidentified sequences (environmental sequences included) currently in public databases. These can lead to erroneous placement of unknowns, even if tree-based approaches are used. Therefore, identifying these errors and re-annotating the affected sequences in an automated fashion is a critical challenge, especially for extremely large datasets where manual phylogenetic analysis is not feasible due to the presence of millions of sequence reads that correspond to "unknown" operational taxonomic units (OTUs). Consequently, clean reference databases as well as automated phylogenetic assignment and analysis methods are critical needs.
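As a hedged illustration of the kind of automated check this calls for, the sketch below flags reference sequences whose genus label disagrees with the labels of their most similar neighbors. It is not a method discussed at the meeting: the k-mer Jaccard similarity, the neighbor count, and the function names (`kmers`, `jaccard`, `flag_suspect_labels`) are assumptions chosen for brevity, and a real curation pipeline would more likely rely on alignment- or tree-based placement.

```python
from collections import Counter


def kmers(seq, k=5):
    """Return the set of overlapping k-mers in a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_suspect_labels(refs, n_neighbors=3, k=5):
    """Flag references whose label disagrees with most of their nearest neighbors.

    refs: dict mapping sequence ID -> (genus_label, sequence).
    Returns a dict mapping flagged ID -> (current_label, majority_neighbor_label).
    Hypothetical heuristic for illustration, not a published curation algorithm.
    """
    profiles = {sid: kmers(seq, k) for sid, (_, seq) in refs.items()}
    suspects = {}
    for sid, (label, _) in refs.items():
        # Rank all other references by k-mer similarity, most similar first.
        ranked = sorted(
            ((jaccard(profiles[sid], profiles[o]), o) for o in refs if o != sid),
            reverse=True,
        )
        top = [o for _, o in ranked[:n_neighbors]]
        if not top:
            continue
        majority, count = Counter(refs[o][0] for o in top).most_common(1)[0]
        # Flag only when a strict majority of neighbors agree on another label.
        if majority != label and count > len(top) // 2:
            suspects[sid] = (label, majority)
    return suspects
```

A flagged entry is only a candidate for review: the neighbors themselves may be mislabeled, which is why the report stresses clean reference sets and phylogenetic context rather than nearest-neighbor votes alone.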



Purposes of the Meeting 

This meeting was organized in order to: 

• Facilitate communication, potential data exchange, and collaboration with the aim of improving fungal ITS sequence resources for the research community.

• Identify suitable ITS primers for fungal community analyses using ultra-high-throughput sequencing.

• Develop strategies for automated (and manual) database curation as well as the naming of environmental sequences and OTUs at various levels of resolution.

• Establish a sustainable plan for reference database development and maintenance.
Participants

The meeting participants included researchers representing publicly available databases that contain microbial sequence data (e.g., GenBank, GreenGenes, RDP, SILVA) or fungi-specific resources (e.g., MycoBank, UNITE), as well as researchers currently using ultra-high-throughput sequencing to examine fungal communities or those involved in developing software, such as QIIME [12] and PhyloSift [13], to facilitate such studies.

Activities 

The meeting was conducted as a two-day workshop. The first day was devoted primarily to brief presentations by participants outlining their involvement in curating public sequence databases, developing high-throughput sequencing pipelines, or using ultra-high-throughput sequencing to examine fungal diversity in environmental samples (e.g., air or soil). The presentations are available online [14]. The second day focused on discussions related to the assembly of a high-quality reference database of fungal ITS sequences, the selection of ITS primers suitable for ultra-high-throughput sequencing, and methods to link ITS sequences to the fungal phylogeny for automated curation, quality control, and phylogeny-based community analysis.

Conclusions / Outcomes 

Ultra-high-throughput sequence processing/analytical pipelines, such as those implemented in QIIME, rely on de-replication of large sequence datasets through clustering to create reference sequence sets that can be used to assist in the recognition of OTUs from environmental samples. The meeting participants largely supported the use of the UNITE database [15,16] as a focal point for the development of high-quality fungal ITS reference sequence sets. UNITE has already implemented several desirable features for this task, including:

• A comprehensive set of approximately 300,000 fungal ITS sequences extracted from public databases.

• An annotation management system (PlutoF) that allows qualified third-party users to add pertinent metadata (e.g., on ecology or geography), improve the taxonomic resolution, tag problematic entries, or correct misidentifications for sequences in the UNITE database.

• Global Key Annotations that permit visualization of sequences clustered at a range of similarity levels (99-97%), with the sequences representing each OTU in the cluster depicted in an alignment.

• Cluster centroids selected with a preference toward sequences that are reliable (e.g., generated from trustworthy sources such as the Assembling the Fungal Tree of Life project or hand-picked by taxonomic experts), taxonomically informative (e.g., identified to the species level), or of particular relevance for taxonomy/nomenclature (e.g., sequences from type specimens).

• Improvements on the horizon (slated for early 2013) included labeling the sequences that represent cluster centroids, so that these unique sequences can be tracked through time as clusters change, as well as options for downloading cluster centroid sequence sets at different sequence similarity levels.
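The centroid preference described above can be sketched as greedy clustering in which sequences are visited in order of preference, so that reliable or type-derived sequences become centroids before ordinary environmental reads. This is an illustrative assumption, not UNITE's actual implementation: the numeric priority scheme, the edit-distance identity (used here in place of a proper pairwise alignment), and the function names are all hypothetical. Running the same function at several thresholds (for example 0.97 and 0.99) yields reference sets at different similarity levels.

```python
def edit_distance(a, b):
    """Levenshtein distance (unit-cost substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def identity(a, b):
    """Crude identity proxy: 1 - edit distance / length of the longer sequence."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)


def greedy_cluster(records, threshold=0.97):
    """Greedy centroid clustering with a preference order for centroids.

    records: list of (seq_id, priority, sequence); a lower priority value means
    the sequence is preferred as a centroid (hypothetical scheme, e.g.
    0 = type specimen, 1 = expert-curated, 2 = other environmental sequence).
    Returns a dict mapping centroid ID -> list of member IDs.
    """
    # Preferred sequences first; among equals, longer sequences first.
    ordered = sorted(records, key=lambda r: (r[1], -len(r[2]), r[0]))
    centroids = []  # (seq_id, sequence) pairs in creation order
    clusters = {}
    for seq_id, _, seq in ordered:
        for cid, cseq in centroids:
            if identity(seq, cseq) >= threshold:
                clusters[cid].append(seq_id)
                break
        else:
            # No existing centroid is close enough: start a new cluster.
            centroids.append((seq_id, seq))
            clusters[seq_id] = [seq_id]
    return clusters
```

Because centroids are fixed once chosen, the ordering step is what encodes the preference for trustworthy sequences; a stable identifier for each centroid would then let clusters be tracked across releases.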

With the availability of these cluster centroids, reference sequence sets for ultra-high-throughput pipelines can be generated rapidly and directly from the UNITE database. Coordination during the meeting resulted in the creation of an alpha version of the UNITE reference set to facilitate OTU picking and taxonomic assignment for fungal ITS sequence reads generated in high-/ultra-high-throughput sequencing runs. This reference set is now publicly available on the QIIME website [12,17].
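Once such a centroid set exists, assigning reads to it (reference-based, or "closed-reference", OTU picking) amounts to finding each read's most similar centroid and discarding matches below a similarity cutoff. The sketch below is a hypothetical illustration, not QIIME's implementation: it uses Python's `difflib.SequenceMatcher.ratio()` as a crude stand-in for alignment identity, and the 0.97 default cutoff and the function names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude similarity in [0, 1]: difflib's 2*M/T ratio, not true alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def closed_reference_pick(reads, references, threshold=0.97):
    """Assign each read to its most similar reference centroid.

    reads: dict read_id -> sequence
    references: dict centroid_id -> sequence
    Returns dict read_id -> centroid_id, or None when no centroid reaches the
    threshold (such reads would be left out of a closed-reference OTU table).
    """
    assignments = {}
    for rid, rseq in reads.items():
        best_id, best_sim = None, -1.0
        for cid, cseq in references.items():
            sim = similarity(rseq, cseq)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        assignments[rid] = best_id if best_sim >= threshold else None
    return assignments
```

Reads left unassigned are exactly the "unknown" environmental sequences discussed earlier; a more complete workflow would cluster them separately rather than discard them.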

UNITE currently provides taxonomic strings based on classification schemes culled from fungal taxonomic resources such as Index Fungorum [18] and MycoBank [19], which are comprehensive databases for fungal names that offer expertise and resources for improving the quality and availability of fungal taxonomic information.
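Taxonomic strings of this kind are often written with rank prefixes. Assuming a `k__Fungi;p__...;s__` style (an assumption about the format, not something stated in this report), the sketch below parses such strings, truncates them to a chosen rank, and reports names that different records place under different parents. The last kind of inconsistency is what can arise when classifications from several taxonomic resources are merged. All function names are hypothetical.

```python
from collections import defaultdict

# Rank prefixes from kingdom to species (assumed k__/p__/.../s__ convention).
RANKS = ["k", "p", "c", "o", "f", "g", "s"]


def parse_taxonomy(tax_string):
    """Parse 'k__Fungi;p__Ascomycota;...' into {rank: name}, skipping empty ranks."""
    parsed = {}
    for field in tax_string.split(";"):
        rank, sep, name = field.strip().partition("__")
        if sep and rank in RANKS and name:
            parsed[rank] = name
    return parsed


def truncate(tax_string, rank):
    """Return the taxonomy string down to and including `rank`."""
    parsed = parse_taxonomy(tax_string)
    keep = RANKS[: RANKS.index(rank) + 1]
    return ";".join(f"{r}__{parsed[r]}" for r in keep if r in parsed)


def find_conflicts(tax_strings):
    """Report names placed under more than one parent across records.

    Returns {(rank, name): sorted list of parent names} for conflicting names.
    """
    parents = defaultdict(set)
    for tax_string in tax_strings:
        parsed = parse_taxonomy(tax_string)
        present = [r for r in RANKS if r in parsed]
        # Pair each filled rank with the nearest higher rank that is filled in.
        for child, parent in zip(present[1:], present):
            parents[(child, parsed[child])].add(parsed[parent])
    return {key: sorted(v) for key, v in parents.items() if len(v) > 1}
```

For example, `truncate("k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__;g__;s__", "c")` returns `"k__Fungi;p__Ascomycota;c__Leotiomycetes"`, dropping the lower ranks and the empty placeholders.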

Journals publishing novel fungal taxa now typically require authors to register new names in MycoBank, which in turn is encouraging submission of informative DNA sequences, such as the ITS region, associated with new taxa to the public databases. In addition to acting as sequence vouchers for type material, these data also have the potential to inform molecular studies examining environmental samples.

Synergistic collaboration and the flow of information between, and within, online taxonomic resources and the public sequence databases (which have expressed interest in using a global standardized taxonomy) were seen by meeting participants as highly desirable. The integration of fungal taxonomy and phylogeny was deemed another important consideration. The Assembling the Fungal Tree of Life (AFToL) project made considerable progress toward refining our understanding of the fungal phylogeny, which informed taxonomy for the kingdom [20]. Sequences generated under AFToL represent reliable data that are desirable as cluster centroids in the fungal reference sets. New projects, such as the Open Tree of Life [21], hold potential as phylogeny-based taxonomic resources for reference databases, while others, such as Fungal Barcoding [22], promise to produce additional high-quality sequences across a wide range of fungal groups for improving the reference datasets.

Although the ITS marker allows fungal sequences to be resolved at the genus or species level, aligning ITS sequences across a wide taxonomic range is essentially unworkable. Thus, meeting participants also discussed strategies for anchoring fungal ITS sequences to the broader phylogeny (e.g., one constrained to the AFToL backbone) using SSU or LSU sequences with contiguous ITS reads. Given that for fungi SSU (18S) is typically uninformative below the order or family level, LSU (26/28S) was seen as an appropriate marker for this task. The large subunit rRNA gene was considered desirable because it has been extensively used in fungal phylogenetic studies, it allows accurate placement in the phylogeny at both higher and lower taxonomic ranks (e.g., phylum/class and family/genus, respectively), and many reliable AFToL sequences (spanning both LSU and the ITS region) are currently available for this purpose [10,23]. Complete genomes, such as those from the fungal genomes project [24], were identified as another reliable source of full-length LSU/ITS sequences. With the anchor tree established, fungal ITS reads produced by ultra-high-throughput sequencing can potentially be placed within a phylogenetic context for taxonomic validation, improving the taxonomy associated with unknown environmental sequences, identifying and naming novel environmental groups, and other automated curation tasks. Linking ITS reads to the fungal phylogeny will also allow phylogenetic metrics of community distance (e.g., UniFrac) [25] to be used in beta-diversity analyses.
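To make the UniFrac idea concrete: unweighted UniFrac is the fraction of the observed branch length in a tree that leads to members of only one of two communities. The minimal sketch below assumes a rooted tree stored as a child-to-(parent, branch length) mapping and presence/absence data on its tips; the representation and function name are illustrative, and toolkits such as QIIME provide production implementations.

```python
def unweighted_unifrac(parent, sample_a, sample_b):
    """Unweighted UniFrac distance between two communities.

    parent: dict mapping every non-root node -> (parent_node, branch_length),
            where branch_length is the length of the edge above that node.
    sample_a, sample_b: sets of tip names observed in each community.
    Returns unique branch length / total observed branch length, in [0, 1].
    """
    def covered_edges(tips):
        # Collect every edge on the paths from the observed tips to the root.
        edges = set()
        for tip in tips:
            node = tip
            while node in parent and node not in edges:
                edges.add(node)  # each edge is identified by its child node
                node = parent[node][0]
        return edges

    def total_length(edges):
        return sum(parent[n][1] for n in edges)

    a, b = covered_edges(sample_a), covered_edges(sample_b)
    observed = total_length(a | b)
    if observed == 0:
        return 0.0
    # Edges leading to only one community are in the symmetric difference.
    return total_length(a ^ b) / observed
```

For a toy tree `{"A": ("X", 1.0), "B": ("X", 1.0), "X": ("root", 0.5), "C": ("root", 1.0)}`, comparing `{"A"}` with `{"B"}` excludes only the shared edge above `X`, giving 2.0 / 2.5 = 0.8; this is why ITS reads must first be anchored in a tree before such metrics can be used.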

Two talks on the first day of the workshop presented preliminary data from ultra-high-throughput sequence surveys of soil fungi targeting the ITS1 region using the primer pair ITS1-F/ITS2 [see 26]. Because spliceosomal inserts are known to occur toward the 3' end of the SSU rRNA gene, where they could interfere with priming sites and cause biases against groups of fungi in which such inserts exist (e.g., Helotiales), participants expressed a preference for primers targeting the ITS2 region, for which no such spliceosomal inserts are known. Additional reasons for adopting the ITS2 region as a marker include its close proximity to LSU (e.g., for anchoring to the phylogeny, see above), less variation in read length compared to ITS1, and the availability of data on ITS2 secondary structure [27,28] that can inform sequence alignments. With continually improving read lengths of ultra-high-throughput sequencing platforms, full-length ITS/LSU reads may be possible in the near future. Primers targeting the ITS2 region have been identified and are currently being tested for use in ultra-high-throughput sequencing. Recommendations for fungal ITS2 primers, as well as current versions of the UNITE centroid reference sequence sets, will be available on the QIIME website [12] in the coming year.
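The concern about inserts disrupting priming sites can be illustrated with a small in-silico check: scan a template for windows that a (possibly degenerate) primer matches within a mismatch budget, requiring the 3'-terminal bases to match exactly, since mismatches at the 3' end are the most disruptive to primer extension. An insert inside the binding site then makes the site disappear. The primer, templates, mismatch limits, and function names are hypothetical; this is a sketch, not a recommendation of specific primers or settings.

```python
# IUPAC nucleotide codes -> the bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, site))


def find_primer_sites(primer, template, max_mismatches=2, check_3prime=3):
    """Return (position, mismatches) for candidate binding sites.

    The primer is written 5'->3' and compared with the strand that has the same
    sequence; for a reverse primer, search reverse_complement(template) instead.
    The last `check_3prime` primer bases must match exactly.
    """
    n = len(primer)
    hits = []
    for i in range(len(template) - n + 1):
        site = template[i:i + n]
        mm = mismatches(primer, site)
        tail_ok = mismatches(primer[n - check_3prime:], site[n - check_3prime:]) == 0
        if mm <= max_mismatches and tail_ok:
            hits.append((i, mm))
    return hits
```

With the hypothetical primer `GATTRCAGCT`, the template `TTTTGATTACAGCTCCCC` yields a perfect hit at position 4, whereas `TTTTGATTAGGTAAGTCAGCTCCCC`, the same site interrupted by an intron-like insert, yields no hit at all.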